Multimedia capturing capabilities have become common features in portable devices. Thus, many people tend to record or capture an event that they are attending, such as a music concert or a sports event. On many occasions, multiple attendees capture content from the same event, whereby variations in capturing location, view, equipment, etc. result in a plurality of captured versions of the event with considerable variety in both the quality and the content of the captured media.
Video remixing is an application where multiple video recordings are combined in order to obtain a video mix that contains some segments selected from the plurality of video recordings. Video remixing, as such, is one of the basic manual video editing applications, for which various software products and services are already available. Furthermore, there exist automatic video remixing or editing systems, which use multiple instances of user-generated or professional recordings to automatically generate a remix that combines segments of the available source content. Some automatic video remixing systems depend only on the recorded content, while others are capable of utilizing environmental context data that is recorded together with the video content. The context data may be, for example, sensor data received from a compass, an accelerometer, or a gyroscope, or global positioning system (GPS) location data.
In the existing automatic video remixing services, a remix presentation from an event, such as a music concert or a theatre play, is primarily based on the audio tracks of the source videos. Additionally, camera sensor data may be used for excluding out-of-focus or shaky video shots. Through straightforward video content analysis, dark shots may be excluded, and locations of interest may be determined based on information indicating that several users are simultaneously panning towards and pointing at the same region.
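The exclusion of dark or shaky shots described above can be sketched as follows. This is a minimal illustration only, not the method of any particular remixing service: the `Shot` record, the function names, and the threshold values are all hypothetical, and the sketch assumes that a mean luminance value and gyroscope angular-speed samples have already been extracted for each shot.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class Shot:
    """Hypothetical per-shot summary; field names are assumptions."""
    source_id: str
    mean_luma: float  # average luminance, 0 (black) to 255 (white)
    gyro_magnitudes: List[float] = field(default_factory=list)  # rad/s samples


def is_usable(shot: Shot, min_luma: float = 40.0,
              max_mean_gyro: float = 0.5) -> bool:
    """Return True if the shot is neither too dark nor too shaky.

    Both thresholds are illustrative values, not taken from the text.
    """
    if shot.mean_luma < min_luma:
        return False  # dark shot
    if shot.gyro_magnitudes and mean(shot.gyro_magnitudes) > max_mean_gyro:
        return False  # shaky shot, judged from gyroscope context data
    return True


def filter_shots(shots: List[Shot]) -> List[Shot]:
    """Keep only the shots that pass both checks, preserving order."""
    return [s for s in shots if is_usable(s)]


candidates = [
    Shot("steady_bright", 120.0, [0.1, 0.2]),
    Shot("too_dark", 10.0, [0.1]),
    Shot("too_shaky", 130.0, [1.0, 2.0]),
]
usable = filter_shots(candidates)
```

A filter of this kind only removes unusable material; it says nothing about what is shown in the remaining shots.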
However, in order to make clever cuts between source videos, more detailed information on what happens in the video should be obtained. For example, to be able to cut to the singer during the chorus at a music concert, the singer must somehow be identified in the video. Similarly, during a guitar solo, it would be desirable to focus on the guitarist.